An Information-Theoretic Analysis of Deduplication
Deduplication finds and removes long-range data duplicates. It is commonly
used in cloud and enterprise server settings and has been successfully applied
to primary, backup, and archival storage. Despite its practical importance as a
source-coding technique, its analysis from the point of view of information
theory is missing. This paper provides such an information-theoretic analysis
of data deduplication. It introduces a new source model adapted to the
deduplication setting. It formalizes the two standard fixed-length and
variable-length deduplication schemes, and it introduces a novel multi-chunk
deduplication scheme. It then provides an analysis of these three deduplication
variants, emphasizing the importance of boundary synchronization between source
blocks and deduplication chunks. In particular, under fairly mild assumptions,
the proposed multi-chunk deduplication scheme is shown to be order optimal.
Comment: 27 pages
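The fixed-length scheme mentioned above can be illustrated with a minimal sketch (not the paper's formal construction): data is cut into fixed-size chunks, each chunk is fingerprinted, and repeated chunks are stored only once. The function names and the choice of SHA-256 as fingerprint are illustrative assumptions.

```python
import hashlib

def dedup_fixed(data: bytes, chunk_len: int):
    """Fixed-length deduplication sketch: split data into chunks of
    chunk_len bytes, store each distinct chunk once, and represent the
    stream as a sequence of chunk fingerprints."""
    store = {}   # fingerprint -> chunk bytes (deduplicated storage)
    recipe = []  # sequence of fingerprints that reconstructs the data
    for i in range(0, len(data), chunk_len):
        chunk = data[i:i + chunk_len]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # store only the first occurrence
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(store[fp] for fp in recipe)

data = b"abcdabcdabcdxyz"  # long-range repeats of "abcd"
store, recipe = dedup_fixed(data, 4)
assert restore(store, recipe) == data
# Only 2 distinct chunks ("abcd", "xyz") are stored for 4 references.
```

A variable-length scheme would instead place chunk boundaries content-dependently (e.g., via a rolling hash), which is what lets boundaries resynchronize with the source blocks after insertions — the synchronization issue the abstract emphasizes.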
The Approximate Capacity of the Gaussian N-Relay Diamond Network
We consider the Gaussian "diamond" or parallel relay network, in which a
source node transmits a message to a destination node with the help of N
relays. Even for the symmetric setting, in which the channel gains to the
relays are identical and the channel gains from the relays are identical, the
capacity of this channel is unknown in general. The best known capacity
approximation is up to an additive gap of order N bits and up to a
multiplicative gap of order N^2, with both gaps independent of the channel
gains.
In this paper, we approximate the capacity of the symmetric Gaussian N-relay
diamond network up to an additive gap of 1.8 bits and up to a multiplicative
gap of a factor 14. Both gaps are independent of the channel gains and, unlike
the best previously known result, are also independent of the number of relays
N in the network. Achievability is based on bursty amplify-and-forward, showing
that this simple scheme is uniformly approximately optimal in both the
low-rate and high-rate regimes. The upper bound on capacity is
based on a careful evaluation of the cut-set bound. We also present
approximation results for the asymmetric Gaussian N-relay diamond network. In
particular, we show that bursty amplify-and-forward combined with optimal relay
selection achieves a rate within a factor O(log^4(N)) of capacity with
pre-constant in the order notation independent of the channel gains.
Comment: 23 pages, to appear in IEEE Transactions on Information Theory
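As a rough numeric companion (not taken from the paper), the cut-set bound referenced above can itself be bounded from above by evaluating just two of the 2^N cuts of the symmetric diamond: the broadcast cut (source to all relays) and the MAC cut (all relays, fully cooperating, to the destination). The sketch below assumes a complex-baseband model with unit noise power; the function name and parameterization are invented for illustration.

```python
from math import log2

def diamond_cut_upper_bound(N, g2P, h2P):
    """Upper bound on the cut-set bound of the symmetric Gaussian
    N-relay diamond network (complex baseband, unit noise), using two
    of its 2^N cuts:
      - broadcast cut: source -> all N relays (receive combining),
      - MAC cut: all N relays -> destination (coherent transmit combining).
    g2P = |g|^2 * P is the source-relay SNR, h2P = |h|^2 * P the
    relay-destination SNR."""
    c_broadcast = log2(1 + N * g2P)
    c_mac = log2(1 + (N ** 2) * h2P)
    return min(c_broadcast, c_mac)

# The true cut-set bound minimizes over all 2^N cuts, so this value
# only upper-bounds it; the capacity lies below both.
print(diamond_cut_upper_bound(4, 1.0, 1.0))
```

The abstract's point is that capacity stays within a constant additive and multiplicative gap of such cut-set quantities, uniformly in the channel gains and in N.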
Tracking Stopping Times Through Noisy Observations
A novel quickest detection setting is proposed which is a generalization of
the well-known Bayesian change-point detection model. Suppose
\{(X_i,Y_i)\}_{i\geq 1} is a sequence of pairs of random variables, and that S
is a stopping time with respect to \{X_i\}_{i\geq 1}. The problem is to find a
stopping time T with respect to \{Y_i\}_{i\geq 1} that optimally tracks S, in
the sense that T minimizes the expected reaction delay E(T-S)^+, while keeping
the false-alarm probability P(T<S) below a given threshold \alpha \in [0,1].
This problem formulation applies in several areas, such as in communication,
detection, forecasting, and quality control.
Our results relate to the situation where the X_i's and Y_i's take values in
finite alphabets and where S is bounded by some positive integer \kappa. By
using elementary methods based on the analysis of the tree structure of
stopping times, we exhibit an algorithm that computes the optimal average
reaction delays for all \alpha \in [0,1], and constructs the associated optimal
stopping times T. Under certain conditions on \{(X_i,Y_i)\}_{i\geq 1} and S,
the algorithm running time is polynomial in \kappa.
Comment: 19 pages, 4 figures, to appear in IEEE Transactions on Information Theory
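The two performance metrics above can be estimated by Monte Carlo for a deliberately naive tracker (not the paper's optimal algorithm): the X_i are i.i.d. Bernoulli(p), S is the first time X_i = 1 (a stopping time with respect to {X_i}, capped at kappa), Y_i is X_i observed through a binary symmetric channel with crossover probability eps, and T naively stops at the first Y_i = 1. All parameter values are illustrative.

```python
import random

def track(p=0.3, eps=0.1, kappa=20, trials=20000, seed=0):
    """Monte Carlo estimates of the expected reaction delay E(T-S)^+
    and the false-alarm probability P(T<S) for a naive tracker; NOT
    the optimal stopping time constructed in the paper."""
    rng = random.Random(seed)
    delay_sum, false_alarms = 0.0, 0
    for _ in range(trials):
        S = T = None
        for i in range(1, kappa + 1):
            x = 1 if rng.random() < p else 0          # X_i ~ Bernoulli(p)
            y = x ^ (1 if rng.random() < eps else 0)  # Y_i = X_i through BSC(eps)
            if S is None and x == 1:
                S = i  # first hitting time of {X_i = 1}
            if T is None and y == 1:
                T = i  # naive tracker: stop at first observed 1
        if S is None:
            S = kappa  # cap S at kappa, as in the abstract
        if T is None:
            T = kappa
        delay_sum += max(T - S, 0)
        false_alarms += 1 if T < S else 0
    return delay_sum / trials, false_alarms / trials

delay, pfa = track()
print(delay, pfa)
```

The paper's algorithm instead computes, for every threshold \alpha in [0,1], the stopping time T minimizing the delay subject to P(T<S) <= \alpha; the sketch only illustrates how the two metrics trade off for one fixed rule.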
Fundamental Limits of Caching
Caching is a technique to reduce peak traffic rates by prefetching popular
content into memories at the end users. Conventionally, these memories are used
to deliver requested content in part from a locally cached copy rather than
through the network. The gain offered by this approach, which we term local
caching gain, depends on the local cache size (i.e., the memory available at
each individual user). In this paper, we introduce and exploit a second,
global, caching gain not utilized by conventional caching schemes. This gain
depends on the aggregate global cache size (i.e., the cumulative memory
available at all users), even though there is no cooperation among the users.
To evaluate and isolate these two gains, we introduce an
information-theoretic formulation of the caching problem focusing on its basic
structure. For this setting, we propose a novel coded caching scheme that
exploits both local and global caching gains, leading to a multiplicative
improvement in the peak rate compared to previously known schemes. In
particular, the improvement can be on the order of the number of users in the
network. Moreover, we argue that the performance of the proposed scheme is
within a constant factor of the information-theoretic optimum for all values of
the problem parameters.
Comment: To appear in IEEE Transactions on Information Theory
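The global caching gain can be seen in the canonical two-user instance of this setting (N = 2 files, K = 2 users, cache size M = 1 file): one coded multicast of half a file serves both users simultaneously, for a peak rate of 1/2 instead of the uncoded rate of 1. A sketch along those lines, with short byte strings standing in for files:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two files, each split into two equal subfiles.
A, B = b"AaAa", b"BbBb"
A1, A2 = A[:2], A[2:]
B1, B2 = B[:2], B[2:]

# Placement (each cache holds M = 1 file's worth of content):
cache1 = {"A1": A1, "B1": B1}  # user 1 caches the first half of every file
cache2 = {"A2": A2, "B2": B2}  # user 2 caches the second half of every file

# Delivery for demands (user 1 wants A, user 2 wants B): a single coded
# multicast of half a file, instead of two uncoded unicast halves.
msg = xor(A2, B1)

# Decoding: each user cancels the part it already caches.
user1_A = cache1["A1"] + xor(msg, cache1["B1"])  # recovers A2, then A
user2_B = xor(msg, cache2["A2"]) + cache2["B2"]  # recovers B1, then B
assert user1_A == A and user2_B == B
```

Note that the users never cooperate: the gain comes purely from the careful content overlap across caches, which creates the single multicast opportunity.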
Energy-Efficient Communication over the Unsynchronized Gaussian Diamond Network
Communication networks are often designed and analyzed assuming tight
synchronization among nodes. However, in applications that require
communication in the energy-efficient regime of low signal-to-noise ratios,
establishing tight synchronization among nodes in the network can result in a
significant energy overhead. Motivated by a recent result showing that
near-optimal energy efficiency can be achieved over the AWGN channel without
requiring tight synchronization, we consider the question of whether the
potential gains of cooperative communication can be achieved in the absence of
synchronization. We focus on the symmetric Gaussian diamond network and
establish that cooperative-communication gains are indeed feasible even with
unsynchronized nodes. More precisely, we show that the capacity per unit energy
of the unsynchronized symmetric Gaussian diamond network is within a constant
factor of the capacity per unit energy of the corresponding synchronized
network. To this end, we propose a distributed relaying scheme that does not
require tight synchronization but nevertheless achieves most of the energy
gains of coherent combining.
Comment: 20 pages, 4 figures, submitted to IEEE Transactions on Information Theory, presented at IEEE ISIT 201
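For context on the figure of merit, here is a small numeric sketch for the point-to-point AWGN channel (not the diamond network of the paper): the capacity per unit energy log2(1 + P)/P approaches 1/ln 2 bits per unit energy as the power P tends to 0, i.e. the minimum energy per bit is ln 2 (the well-known -1.59 dB Shannon limit). The abstract's claim is that the unsynchronized diamond attains this kind of metric within a constant factor of the synchronized network.

```python
from math import log2, log

# Capacity per unit energy of the AWGN channel with unit noise:
# C(P)/P = log2(1 + P)/P, which increases as P decreases and tends
# to 1/ln 2 bits per unit energy in the low-power (wideband) limit.
for P in [1.0, 0.1, 0.01, 0.001]:
    print(P, log2(1 + P) / P)
print("low-power limit:", 1 / log(2))
```

This is why the abstract focuses on the low-SNR regime: capacity per unit energy is maximized there, but that is exactly where tight synchronization is costly to maintain.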
Computation Alignment: Capacity Approximation without Noise Accumulation
Consider several source nodes communicating across a wireless network to a
destination node with the help of several layers of relay nodes. Recent work by
Avestimehr et al. has approximated the capacity of this network up to an
additive gap. The communication scheme achieving this capacity approximation is
based on compress-and-forward, resulting in noise accumulation as the messages
traverse the network. As a consequence, the approximation gap increases
linearly with the network depth.
This paper develops a computation alignment strategy that can approach the
capacity of a class of layered, time-varying wireless relay networks up to an
approximation gap that is independent of the network depth. This strategy is
based on the compute-and-forward framework, which enables relays to decode
deterministic functions of the transmitted messages. Alone, compute-and-forward
is insufficient to approach the capacity as it incurs a penalty for
approximating the wireless channel with complex-valued coefficients by a
channel with integer coefficients. Here, this penalty is circumvented by
carefully matching channel realizations across time slots to create
integer-valued effective channels that are well-suited to compute-and-forward.
Unlike prior constant gap results, the approximation gap obtained in this paper
also depends closely on the fading statistics, which are assumed to be i.i.d.
Rayleigh.
Comment: 36 pages, to appear in IEEE Transactions on Information Theory
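The end-to-end logic of the compute-and-forward framework mentioned above — relays decode and forward integer combinations of the messages, and the destination inverts a full-rank system of such combinations — can be sketched over a prime field, deliberately omitting the lattice coding and the channel-matching step that is this paper's contribution. The field size and all names are illustrative assumptions.

```python
q = 257  # prime field size (illustrative)

def solve_2x2_mod(A, b, q):
    """Solve A w = b over Z_q for a full-rank 2x2 integer matrix A,
    using the 2x2 adjugate and a Fermat modular inverse (q prime)."""
    det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % q
    det_inv = pow(det, q - 2, q)  # requires det != 0 mod q (full rank)
    w0 = ((A[1][1] * b[0] - A[0][1] * b[1]) * det_inv) % q
    w1 = ((-A[1][0] * b[0] + A[0][0] * b[1]) * det_inv) % q
    return w0, w1

# Two source messages, as elements of Z_q:
w = (123, 45)

# Each relay reliably decodes one integer combination of the messages
# (compute-and-forward makes this possible when the integer vector is
# a good approximation of the actual channel vector):
A = [[1, 1], [1, 2]]
b = [(A[0][0] * w[0] + A[0][1] * w[1]) % q,
     (A[1][0] * w[0] + A[1][1] * w[1]) % q]

# The destination collects the combinations and inverts the system:
assert solve_2x2_mod(A, b, q) == w
```

The penalty the abstract refers to arises when the true complex channel coefficients are far from any small integer vector; computation alignment sidesteps it by pairing time slots so that the effective channels are integer-valued to begin with.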
Decentralized Coded Caching Attains Order-Optimal Memory-Rate Tradeoff
Replicating or caching popular content in memories distributed across the
network is a technique to reduce peak network loads. Conventionally, the main
performance gain of this caching was thought to result from making part of the
requested data available closer to end users. Instead, we recently showed that
a much more significant gain can be achieved by using caches to create
coded-multicasting opportunities, even for users with different demands,
through coding across data streams. These coded-multicasting opportunities are
enabled by careful content overlap at the various caches in the network,
created by a central coordinating server.
In many scenarios, such a central coordinating server may not be available,
raising the question of whether this multicasting gain can still be achieved in a
decentralized setting. In this paper, we propose an efficient caching scheme,
in which the content placement is performed in a decentralized manner. In other
words, no coordination is required for the content placement. Despite this lack
of coordination, the proposed scheme is nevertheless able to create
coded-multicasting opportunities and achieves a rate close to the optimal
centralized scheme.
Comment: To appear in IEEE/ACM Transactions on Networking
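A small numeric comparison of delivery loads; the decentralized rate expression below is quoted from the coded caching literature as commonly stated and should be treated as an assumption of this sketch (valid for 0 < M <= N, with K users, N files, and cache size M files per user):

```python
def uncoded_rate(K, N, M):
    """Conventional (uncoded) delivery load on the shared link: each
    of the K users fetches the 1 - M/N fraction of its requested file
    that it did not cache locally."""
    return K * (1 - M / N)

def decentralized_rate(K, N, M):
    """Delivery load of decentralized coded caching with independent
    random placement, as the rate expression is commonly reported:
    R = K * (1 - M/N) * (N / (K*M)) * (1 - (1 - M/N)^K)."""
    p = M / N  # fraction of each file cached by every user
    return K * (1 - p) * (1 / (K * p)) * (1 - (1 - p) ** K)

K, N, M = 20, 20, 10
print(uncoded_rate(K, N, M))        # 10.0 file transmissions
print(decentralized_rate(K, N, M))  # ~1.0: roughly a factor K*M/N smaller
```

Despite the uncoordinated placement, the load stays close to that of the centralized scheme and, as the abstract states, within a constant factor of the optimum.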